
    Towards measuring the complexity of introducing semantics into a company

    The Semantics Difficulty Model (SDM) is a model that measures the difficulty of introducing semantic technology into a company. SDM manages three descriptions of stages, which we will refer to as "snapshots": a company semantic snapshot, a data snapshot and a semantic application snapshot. Understanding a priori the complexity of introducing semantics into a company is important because it allows the organization to take early decisions, thus saving time and money, mitigating risks and improving innovation, time to market and productivity. SDM works by measuring the Euclidean distance between each initial snapshot and its reference model (the company semantic snapshot reference model, the data snapshot reference model, and the semantic application snapshot reference model). The difficulty level is "not at all difficult" when the distance is small and becomes "extremely difficult" when the distance is large. SDM has been tested experimentally with 2,000 simulated companies with different arrangements and several initial stages. The output is expressed on a scale of five linguistic values: "not at all difficult", "slightly difficult", "averagely difficult", "very difficult" and "extremely difficult". The preliminary results of our SDM simulation indicate that transforming a search application into one that integrates data from different sources with semantics is "slightly difficult", in contrast with data and opinion extraction applications, for which it is "very difficult".
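
    A minimal sketch of the distance-to-difficulty mapping described above. The snapshot encodings, reference-model values and thresholds below are hypothetical (none come from the paper); snapshots are reduced to three numeric features in [0, 1] purely for illustration.

        import math

        # Hypothetical numeric encodings of the three initial snapshots
        # (company semantic, data, semantic application) and of their
        # reference models. Features and values are illustrative only.
        snapshots = {
            "company":     [0.2, 0.5, 0.1],
            "data":        [0.7, 0.3, 0.4],
            "application": [0.1, 0.1, 0.2],
        }
        reference_models = {
            "company":     [0.9, 0.8, 0.7],
            "data":        [0.8, 0.6, 0.9],
            "application": [0.7, 0.9, 0.8],
        }

        LEVELS = ["not at all difficult", "slightly difficult",
                  "averagely difficult", "very difficult", "extremely difficult"]

        def euclidean(a, b):
            """Euclidean distance between a snapshot and its reference model."""
            return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

        def difficulty(distance, max_distance):
            """Map a normalized distance onto the five linguistic values."""
            idx = min(int(distance / max_distance * len(LEVELS)), len(LEVELS) - 1)
            return LEVELS[idx]

        max_d = math.sqrt(3)  # largest possible distance for 3 features in [0, 1]
        for name, snap in snapshots.items():
            d = euclidean(snap, reference_models[name])
            print(f"{name}: distance={d:.2f} -> {difficulty(d, max_d)}")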

    Is the polarity of content producers strongly influenced by the results of the event?

    This paper presents an approach to compare two types of data, subjective data (the polarity of posts about the 2011 Pan American Games, by country) and objective data (the number of medals won by each participating country), based on the Pearson correlation. When dealing with events described by people, knowledge acquisition is difficult because the information is heterogeneous in structure and subjective. A first step towards knowing the polarity of the information provided by people consists in automatically classifying the posts into clusters according to their polarity. The authors carried out a set of experiments using a corpus of 5,600 posts extracted from 168 Internet resources related to a specific event: the 2011 Pan American Games. The approach is based on four components: a crawler, a filter, a synthesizer and a polarity analyzer. The PanAmerican approach automatically classifies the polarity of the event into clusters with the following results: 588 positive, 336 neutral, and 76 negative. We found that the polarity of the content produced was strongly influenced by the results of the event, with a correlation of .74. Thus, it is possible to conclude that the polarity of content is strongly affected by the results of the event. Finally, the accuracy of the PanAmerican approach is .87, .90, and .80 according to the precision of the three polarity classes evaluated.
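
    A sketch of the Pearson correlation step described above, assuming made-up per-country polarity scores and medal counts; only the correlation computation itself follows from the abstract, the numbers are not the paper's data.

        import math

        # Hypothetical per-country data: subjective polarity score of the posts
        # about the event and the objective medal count (illustrative values).
        polarity = [0.8, 0.6, 0.4, 0.3, 0.1]
        medals   = [236, 136, 75, 59, 22]

        def pearson(x, y):
            """Pearson correlation coefficient between two equal-length series."""
            n = len(x)
            mx, my = sum(x) / n, sum(y) / n
            cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
            sx = math.sqrt(sum((a - mx) ** 2 for a in x))
            sy = math.sqrt(sum((b - my) ** 2 for b in y))
            return cov / (sx * sy)

        print(f"r = {pearson(polarity, medals):.2f}")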

    The unified sentiment lexicon using GPUs

    This approach aims at aligning, unifying and expanding the set of sentiment lexicons which are available on the web in order to increase their robustness of coverage. A sentiment lexicon is a critical and essential resource for tagging subjective corpora on the web or elsewhere. In many situations, the multilingual property of the sentiment lexicon is important because the writer may alternate between two languages in the same text, message or post. Our USL approach computes the unified strength of polarity of each lexical entry based on the Pearson correlation coefficient, which measures how correlated lexical entries are with a value between 1 and -1, where 1 indicates that the lexical entries are perfectly correlated, 0 indicates no correlation, and -1 means they are perfectly inversely correlated, and on the UnifiedMetrics procedure for CPU and GPU, respectively.
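
    A small sketch of correlating and unifying two source lexicons, assuming Python 3.10+ for statistics.correlation. The lexicon entries and scores are invented, and plain averaging stands in for the paper's UnifiedMetrics procedure, whose exact definition is not given in the abstract.

        import statistics

        # Two hypothetical source lexicons mapping entries to polarity in [-1, 1].
        lexicon_a = {"good": 0.8, "bad": -0.7, "awful": -0.9, "nice": 0.6}
        lexicon_b = {"good": 0.7, "bad": -0.6, "awful": -0.8, "nice": 0.5}

        shared = sorted(set(lexicon_a) & set(lexicon_b))
        xs = [lexicon_a[w] for w in shared]
        ys = [lexicon_b[w] for w in shared]

        # Pearson correlation between the two lexicons over shared entries.
        r = statistics.correlation(xs, ys)

        # Illustrative unification rule: average the two polarity strengths.
        unified = {w: (lexicon_a[w] + lexicon_b[w]) / 2 for w in shared}

        print(f"correlation between lexicons: r = {r:.2f}")
        print(unified)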

    Towards a unified sentiment lexicon based on graphics processing units

    This paper presents an approach to create what we have called a Unified Sentiment Lexicon (USL). This approach aims at aligning, unifying, and expanding the set of sentiment lexicons which are available on the web in order to increase their robustness of coverage. One problem related to the task of automatically unifying the different scores of sentiment lexicons is that there are multiple lexical entries for which the classification as positive, negative, or neutral {P, Z, N} depends on the unit of measurement used in the annotation methodology of the source sentiment lexicon. Our USL approach computes the unified strength of polarity of each lexical entry based on the Pearson correlation coefficient, which measures how correlated lexical entries are with a value between 1 and -1, where 1 indicates that the lexical entries are perfectly correlated, 0 indicates no correlation, and -1 means they are perfectly inversely correlated, and on the UnifiedMetrics procedure for CPU and GPU, respectively. Another problem is the high processing time required for computing all the lexical entries in the unification task. Thus, the USL approach computes a subset of lexical entries in each of the 1,344 GPU cores and uses parallel processing in order to unify 155,802 lexical entries. The results of the analysis conducted using the USL approach show that the USL has 95,430 lexical entries, of which 35,201 are considered positive, 22,029 negative, and 38,200 neutral. Finally, the runtime was 10 minutes for 95,430 lexical entries; this represents a threefold reduction in computing time for the UnifiedMetrics procedure.
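
    A sketch of how the unification workload might be partitioned across cores, using the entry and core counts quoted above; the slicing scheme is an assumption, not the paper's kernel, and the per-entry work itself is omitted.

        # Each of the 1,344 cores gets a contiguous slice of the 155,802
        # lexical entries; the per-entry UnifiedMetrics step is a placeholder.
        N_ENTRIES = 155_802
        N_CORES = 1_344

        def slice_for_core(core_id, n_entries=N_ENTRIES, n_cores=N_CORES):
            """Return the half-open index range [start, end) assigned to one core."""
            per_core = -(-n_entries // n_cores)  # ceiling division
            start = core_id * per_core
            end = min(start + per_core, n_entries)
            return start, end

        # Each core processes at most 116 entries (ceil(155802 / 1344)).
        print(slice_for_core(0))      # first core: (0, 116)
        print(slice_for_core(1343))   # last core gets the smaller remainder slice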

    The Spanish Travel Subjective Lexicon (STSL).

    This paper presents a proposal for a recognition model for the appraisal value of sentences. It is based on splitting the text into independent sentences (at full stops) and then analysing the appraisal elements contained in each sentence according to their values in the appraisal lexicon. In this lexicon, positive words are assigned a positive coefficient (+1) and negative words a negative coefficient (-1). We take into account words such as "too", "little" (when it does not mean "a bit"), "less", and "nothing", which can modify the polarity degree of a lexical unit when they appear in its immediate context. If any of these elements is present, the previous coefficient is multiplied by (-1), that is, it changes its sign. Our results show a nearly theoretical effectiveness of 90%, despite not achieving the recognition (or misrecognition) of implicit elements. These elements represent approximately 4% of the total of sentences analysed for appraisal and include the errors in the recognition of coordinated sentences. On the one hand, we found that 3.6% of the sentences could not be recognized because they use different connectors than those included in the model; on the other hand, we found that in 8.6% of the sentences the rules we developed could not be applied despite the use of some of the described connectors. The percentage relative to the whole group of appraisal sentences in the corpus was approximately 5%.
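
    A minimal sketch of the sign-flipping rule described above. The lexicon entries and modifier words are illustrative Spanish stand-ins for the STSL resources, and the two-token context window is an assumption.

        # Illustrative appraisal lexicon with +1 / -1 coefficients.
        appraisal_lexicon = {"bueno": +1, "bonito": +1, "malo": -1, "caro": -1}
        # Modifiers that flip polarity ("too", "little", "less", "nothing").
        modifiers = {"demasiado", "poco", "menos", "nada"}

        def sentence_polarity(tokens, window=2):
            """Sum word coefficients, flipping the sign when a modifier
            appears within `window` tokens before an appraisal word."""
            score = 0
            for i, tok in enumerate(tokens):
                if tok in appraisal_lexicon:
                    coeff = appraisal_lexicon[tok]
                    context = tokens[max(0, i - window):i]
                    if any(m in modifiers for m in context):
                        coeff *= -1  # the modifier changes the coefficient's sign
                    score += coeff
            return score

        print(sentence_polarity("el hotel es poco bonito".split()))  # -1
        print(sentence_polarity("el hotel es muy bonito".split()))   # +1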

    A Semantic web page linguistic annotation model

    Although much research on web page semantic annotation has already been done by AI researchers within the Semantic Web initiative, linguistic text annotation, including semantic annotation, was originally developed in Corpus Linguistics, and its results have been somewhat neglected by AI. The purpose of the research presented in this proposal is to prove that integrating the results of both fields is not only possible but also highly useful for making Semantic Web pages more machine-readable. A multi-level (possibly multi-purpose and multi-language) annotation model based on EAGLES standards and Ontological Semantics, implemented with latest-generation Semantic Web languages, is being developed to fit the needs of both communities.
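
    A purely illustrative sketch of what a multi-level annotation of a web-page token could look like in RDF, built with rdflib. The ann: vocabulary, the token URI and the ontology concept are hypothetical; this is not the authors' annotation schema, only an indication of the kind of output such a model could produce.

        from rdflib import Graph, Literal, Namespace, URIRef

        ANN = Namespace("http://example.org/annotation#")  # hypothetical vocabulary
        g = Graph()
        g.bind("ann", ANN)

        token = URIRef("http://example.org/page.html#token-12")
        g.add((token, ANN.surfaceForm, Literal("bank")))
        # Morphosyntactic level (EAGLES-style part-of-speech tag).
        g.add((token, ANN.posTag, Literal("NN")))
        # Semantic level (concept from a domain ontology).
        g.add((token, ANN.denotesConcept,
               URIRef("http://example.org/onto#FinancialInstitution")))

        print(g.serialize(format="turtle"))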
